<!-- SGML Document | 1996-04-14 | 18KB -->
<!doctype linuxdoc system>
<article>
<title>Translation system for Linuxconf
<author>Introduction
<abstract>
Linuxconf is a large software component, full of menus and dialogs.
To be easily translatable, all messages must be extracted from
the C++ source code and placed into dictionaries which can be
translated efficiently.
A special set of tools has been designed to achieve this. They
are described here.
</abstract>
<sect>Introduction
<p>
This document describes both how the system works and how translators
can use it. It starts by explaining how programmers can use it
to produce translatable programs. The section "How to translate"
explains how translators can use this system to translate
Linuxconf or any program written using this system.
<sect>Principles
<p>
To make programs easily translatable, all messages should be placed in
dictionaries. A dictionary is made of message entries. Each message
has a unique ID and a value. In the C++ source, programmers
refer to those messages by ID whenever they want to print
or say something.

Each time a programmer needs a new message, he has to add it to the
message dictionary and reference it from the C++ source code. This is
how most systems work (there are other translation systems out there).

The system used by <em/Linuxconf/ is fundamentally different. Messages
are defined in the <tt/C++/ source code and the dictionaries are
built by scanning all <tt/C++/ source files.

Programmers must provide an ID and a value for each message right
in the source code. This is much easier (or nicer) than going back
and forth between the source and the dictionary.
Furthermore, the programmer sees the message definition directly
in the source. With other systems, only the message ID is visible
in the source.

Using the magic of the <tt/C/ preprocessor, the message value is
not compiled into the object code at all. Seen this way, the translation
system used by <em/Linuxconf/ yields the same result as other systems.
It is just nicer to use for programmers.

Let's describe how a programmer uses the system.
<sect1>One dictionary per source directory
<p>
It is best to define one message dictionary per sub-project or
sub-directory. This is easier to manage and avoids ID name space
congestion. For each source directory of <em/Linuxconf/ you have
one "<tt/dic/" file and one "<tt/m/" file. Both files are produced
simply by doing
<tscreen><verb>
make msg
</verb></tscreen>
This command scans all <tt/C++/ source files of the current directory
and updates the file <tt>../messages/sources/DIRECTORY.dic</tt> and
the file <tt/DIRECTORY.m/, where <tt/DIRECTORY/ is the name of the
current directory.

<tt/make msg/ uses the <tt>../translate/msgscan</tt> utility to
scan the source. This utility looks for specific constructs in the
<tt/C++/ source files. Here they are.
<sect1>The <tt/MSG_U/ macro
<p>
The <tt/MSG_U/ macro defines a new message. It defines both its ID
and its value. This macro is usable anywhere a <tt/C++/ string would
be.
<tscreen><verb>
#include <stdio.h>
#include "prjfoo.m"
int foo()
{
    printf (MSG_U(M_MSG1,"Entering function foo"));
    return 0;
}
</verb></tscreen>
<tt/U/ stands for unilingual: <tt/MSG_U/ defines a single value
for the message.
<sect1>The <tt/MSG_B/ macro
<p>
The <tt/MSG_B/ macro is like the <tt/MSG_U/ macro, except it
defines two values, allowing a programmer to write a message in
two languages at once. The <tt/B/ stands for bilingual. This
has not been used in the <em/Linuxconf/ project but has proven
effective for other projects.
<tscreen><verb>
#include <stdio.h>
#include "prjfoo.m"
int foo()
{
    /* MSG_B takes the ID, then the first and the second language value */
    printf (MSG_B(M_MSG1
        ,"Entering function foo\n"
        ,"Démarrage de la fonction foo\n"));
    return 0;
}
</verb></tscreen>
<sect1>The <tt/MSG_R/ macro
<p>
The <tt/MSG_R/ macro simply references an already defined
message. This message may have been defined in another source
file (of the same project). Like the other macros, <tt/MSG_R/ may
be used anywhere a <tt/C++/ string is.
<sect1>The <tt/MSG_VERSION/ macro
<p>
This macro has not been used so far. It would allow a programmer
to raise the version number of a dictionary, preventing older
applications from using a newer, potentially incompatible dictionary.
The <tt/msgclean/ utility also changes the version number of the
dictionary. The <tt/MSG_VERSION/ macro is still a concept rather
than a useful addition. Stay tuned...
<sect1>The magic of the <tt/MSG_/ macros
<p>
The <tt/MSG_/ macros perform two tasks. First, they are easily
spotted by the <tt/msgscan/ utility. The parsing is simple and reliable
even if the <tt/C++/ source code is not functional. Second, they
hide the retrieval mechanism (how the message value is retrieved from
the binary dictionary at runtime).

The <tt/msgscan/ utility produces the <tt/.m/ file, which looks like this
for the simple example above.
<tscreen><verb>
FILE prjfoo.m:
extern const char **_dictionary_prjfoo;
#ifndef DICTIONARY_REQUEST
#define DICTIONARY_REQUEST \
const char **_dictionary_prjfoo;\
TRANSLATE_SYSTEM_REQ _dictionary_req_prjfoo\
("prjfoo",_dictionary_prjfoo,55,1);\
void dummy_dict_prjfoo(){}
#endif
#ifndef MSG_U
#define MSG_U(id,m) id
#define MSG_B(id,m,n) id
#define MSG_R(id) id
#endif
#define M_MSG1 _dictionary_prjfoo[0]
</verb></tscreen>
As you see, one global variable is created: <tt/_dictionary_prjfoo/.
A special macro <tt/DICTIONARY_REQUEST/ is defined. This macro
should be placed in one source file of the project. It is generally
placed in the file <tt/_dict.c/ presented later.
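
To make the retrieval mechanism concrete, here is a minimal, self-contained sketch of what the macros reduce to. It copies the macro definitions from the <tt/.m/ file above but replaces the binary dictionary with a small in-memory table. The names <tt/demo_table/, <tt/entering_message/ and <tt/entering_message_again/ are hypothetical and used only for illustration.

```cpp
#include <string>

// Stand-in for the table that the runtime would normally fill from the
// binary dictionary (assumption: a plain in-memory array is enough here).
static const char *demo_table[] = { "Entering function foo" };
const char **_dictionary_prjfoo = demo_table;

// Same definitions as in the generated prjfoo.m shown above.
#define MSG_U(id,m)   id
#define MSG_B(id,m,n) id
#define MSG_R(id)     id
#define M_MSG1 _dictionary_prjfoo[0]

// The literal passed to MSG_U is dropped by the preprocessor; only the
// table lookup remains in the compiled code.
std::string entering_message()
{
    return MSG_U(M_MSG1, "this literal is never compiled in");
}

// MSG_R refers to the same entry without defining it again.
std::string entering_message_again()
{
    return MSG_R(M_MSG1);
}
```

After preprocessing, both functions simply read <tt/_dictionary_prjfoo[0]/. The literal given to <tt/MSG_U/ only serves the scanner.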
<sect>How to use it
<p>
To produce a translatable program, do the following:
<itemize>
<item>Replace all string messages with <tt/MSG_U/ or <tt/MSG_B/
macros, giving each message a unique <tt/ID/.
<item>Include (#include) the <tt/.m/ file in each source file
using the <tt/MSG_x/ macros. This file is generally named
<tt/directory.m/ where <tt/directory/ is the name of the current
directory.
<item>Create a file <tt/_dict.c/. The content of this file
is shown below.
<item>Use "<tt/make msg/" to extract the messages. This produces/updates
the dictionary file <tt/directory.dic/ and produces the include file
<tt/directory.m/.
<item>Compile and link your program.
<item>Use "<tt/make msg.eng/" to produce the English binary dictionary.
The file produced should be placed where your program expects it.
</itemize>
We will now describe the different steps in more detail.
<sect1>The <tt/make msg/ command and <tt/msgscan/ utility
<p>
The <tt/make msg/ command invokes the <tt/msgscan/ utility. This utility
scans a set of <tt/C/ or <tt/C++/ source files, updates a dictionary
file and produces one include file.

Here is the command used to update the dictionary of the sub-project
<tt/uucp/ of the <em/Linuxconf/ project.
<tscreen><verb>
../translate/msgscan uucp \
../messages/sources/uucp.dic uucp.m EF *.c
</verb></tscreen>
The first argument is the name of the dictionary. The second argument
is the path of the dictionary file. As you see, dictionary files are
kept in a single directory for all projects. They are seldom. This
eases the work of translators. The third argument is the path of
the include file, which is produced in the current directory.

The fourth argument gives the letter tags used to identify messages
defined with the macros <tt/MSG_U/ and <tt/MSG_B/. Messages
defined with <tt/MSG_U/ will be tagged with the letter <tt/E/ (English)
and messages defined with <tt/MSG_B/ will be tagged with
<tt/E/ for the first value and <tt/F/ (French) for the second.
The remaining arguments are the source files to scan.
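
As an illustration of why the <tt/MSG_/ constructs are easy to spot, the following sketch extracts ID and value pairs for <tt/MSG_U/ with a regular expression. It is not the real <tt/msgscan/: the function <tt/scan_msg_u/ is hypothetical, it ignores <tt/MSG_B/ and string literals split over several lines, and it does not write any <tt/.dic/ file.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not the real msgscan: collect (ID, value) pairs from
// every MSG_U(ID,"value") construct found in a piece of source text.
std::vector<std::pair<std::string, std::string>>
scan_msg_u(const std::string &source)
{
    // ID is a C identifier; value is one string literal, escaped quotes allowed.
    static const std::regex pattern(
        R"rx(MSG_U\s*\(\s*([A-Za-z_]\w*)\s*,\s*"((?:[^"\\]|\\.)*)"\s*\))rx");

    std::vector<std::pair<std::string, std::string>> found;
    for (std::sregex_iterator it(source.begin(), source.end(), pattern), end;
         it != end; ++it) {
        found.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return found;
}
```

In a scanner of this kind, each value found this way would be stored under the first tag letter given on the command line (<tt/E/ in the command above). The text around the macro does not need to compile for the match to succeed.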
<sect1>The <tt/_dict.c/ file
<p>
It is good practice to place the <tt/DICTIONARY_REQUEST/ macro
in a file <tt/_dict.c/. There is generally one such file per
directory. Its content is generally:
<tscreen><verb>
#include "this_directory.m"
#include <translat.h>
DICTIONARY_REQUEST
</verb></tscreen>
At least this dependency should be placed in your <tt/makefile/:
<tscreen><verb>
_dict.o: _dict.c this_directory.m
</verb></tscreen>
This ensures that each time you update your dictionary (and
the <tt/m/ header file), <tt/_dict.c/ is recompiled, ensuring
proper recording of the dictionary revision and number of messages.
This avoids executing a program with an obsolete
or incompatible binary dictionary.
Given that <tt/_dict.c/ is small, the compilation is pretty
short.
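
The check that this recording makes possible can be pictured as follows. This is only a sketch under assumptions: the structures <tt/DictRequest/ and <tt/DictInfo/, and the exact policy (same name, same revision, at least as many messages as the program expects), are hypothetical and not taken from the <em/Linuxconf/ sources.

```cpp
#include <string>

// What the program was compiled against (hypothetical layout).
struct DictRequest {
    std::string name;
    int message_count;
    int revision;
};

// What a loaded binary dictionary claims to contain (hypothetical layout).
struct DictInfo {
    std::string name;
    int message_count;
    int revision;
};

// Assumed policy: names and revisions must match, and the dictionary must
// hold at least as many messages as the program may index.
bool dictionary_is_usable(const DictRequest &req, const DictInfo &info)
{
    return req.name == info.name
        && req.revision == info.revision
        && req.message_count <= info.message_count;
}
```

If <tt/_dict.c/ were not recompiled after the <tt/m/ file changed, the values on the program side would be stale and a check of this kind could not be trusted.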
<sect1>The <tt/msgcomp/ utility
<p>
Once you have compiled and linked your program, you must "compile"
all the dictionaries used in your program into one binary dictionary.
This is done by the <tt/msgcomp/ utility. Here is the command
used when doing "<tt/make msg.eng/" for the <em/Linuxconf/ project.
This produces the English binary dictionary.
<tscreen><verb>
../translate/msgcomp -p../messages/sources/ \
/tmp/linuxconf-msg-1.3.eng eE \
askrunlevel dialog dnsconf fstab \
misc main netconf mailconf uucp userconf
</verb></tscreen>
This command takes all dictionaries for sub-projects
<tt/askrunlevel dialog dnsconf fstab misc main netconf mailconf
uucp/ and <tt/userconf/ and produces a single binary dictionary.
The <tt/-p/ option tells <tt/msgcomp/ to look for those <tt/dic/ files
(<tt/askrunlevel.dic/, <tt/dialog.dic/, ...)
in the directory <tt>../messages/sources/</tt>.
The argument <tt>/tmp/linuxconf-msg-1.3.eng</tt> is the file to produce.
The argument <tt/eE/ instructs <tt/msgcomp/ to extract message values with
the '<tt/e/' tag. If there is no such value for a given message, the
value with the '<tt/E/' tag will be used.
<sect2>Convention used for tags
<p>
Dictionary files contain the definitions of all messages. Each message
may have different values, identified by a tag letter. When messages
are extracted by <tt/msgscan/, it is instructed to associate values with
given tags. By convention, we use upper-case letters to identify
message values extracted from the source code. Lower-case letters
are used by translators.

We assume here that programmers are bad writers. We let them give
their best shot at messages and we are allowed to override their
work without overwriting it. By giving precedence to '<tt/e/' tags
over '<tt/E/' we are saying that the translators' work overrides the
work of programmers, but we are not forcing the translators to
rewrite everything.
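
The precedence rule can be sketched in a few lines. The type <tt/TaggedValues/ and the function <tt/select_value/ are hypothetical; the sketch only assumes that a message carries one value per tag letter and that the tag argument (such as <tt/eE/) is read from left to right.

```cpp
#include <map>
#include <string>

// Values of one message, keyed by tag letter ('E' from the source code,
// 'e' from a translator, and so on).
using TaggedValues = std::map<char, std::string>;

// Return the value of the first tag in `preference` that has a value,
// or an empty string if none of the tags is present.
std::string select_value(const TaggedValues &values,
                         const std::string &preference)
{
    for (char tag : preference) {
        auto it = values.find(tag);
        if (it != values.end()) return it->second;
    }
    return std::string();
}
```

With the preference <tt/eE/, a message that has both values yields the translator's wording, while a message the translator never touched still yields the programmer's text.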
<sect1>The <tt/msgclean/ utility
<p>
The <tt/msgscan/ utility maintains the dictionaries. At some point
some messages may become obsolete (unused in any source file). The
<tt/msgclean/ utility is used to remove messages without values from the
<tt/dic/ file.
For the <em/Linuxconf/ project, the <tt/make/ target <tt/msg.clean/
is defined for that purpose.

Be aware that applying <tt/msgclean/ to a dictionary file with obsolete
messages has an important side effect. Once some messages are deleted, the
numbering of all following messages changes. All sources using
the <tt/m/ include file should be recompiled.

To avoid problems, the <tt/msgclean/ utility automatically increases
the revision number of the dictionary. This prevents using a
dictionary with an incompatible program.
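
The renumbering side effect is easy to see on a small model. The structures <tt/Entry/ and <tt/Dictionary/ and the functions below are hypothetical stand-ins, assuming only that a message number is the position of the message in the dictionary.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Entry {
    std::string id;
    bool has_value;   // false once no source file defines the message
};

struct Dictionary {
    int revision;
    std::vector<Entry> entries;   // position in this list = message number
};

// Drop entries without a value and raise the revision, because the
// numbers of the remaining messages may have shifted.
void clean(Dictionary &dict)
{
    auto obsolete = [](const Entry &e) { return !e.has_value; };
    dict.entries.erase(
        std::remove_if(dict.entries.begin(), dict.entries.end(), obsolete),
        dict.entries.end());
    ++dict.revision;
}

// Message number of an ID, or -1 if the ID is not in the dictionary.
int number_of(const Dictionary &dict, const std::string &id)
{
    for (std::size_t i = 0; i < dict.entries.size(); ++i)
        if (dict.entries[i].id == id) return static_cast<int>(i);
    return -1;
}
```

A program compiled before the cleaning would still index the old position of every message that followed a deleted one, which is why the revision has to change.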
<sect>Usage restriction
<p>
The strategy used is mainly targeted at <tt/C++/ code. With some
restrictions, it may be used for <tt/C/ code. Here is the main
feature that probably does not work with <tt/C/.
<descrip>
<tag/static initialization/
In <tt/C++/ one can write the following code.
<tscreen><verb>
static char *tb[]={
foo(1),foo(22)
};
</verb></tscreen>
where <tt/foo/ is a function. The <tt/C++/ compiler will generate
the proper code, which will probably be called once. The <tt/MSG_U/
macro (and others) do not hide a function call, but are indeed
dynamic in some sense. <tt/C/ does not support this. Other
dictionary-based translation strategies have the same
limitation, though.
</descrip>
The example using <tt/static char *tb[]/ also causes a problem
in <tt/C++/ if the variable is declared outside of a function. The
problem appears because the "hidden" initialization code generated
by the compiler is called very early, often before <tt/main()/ is called.
Normally, the function <tt/translat_load()/, which brings the
dictionary into memory, is called by <tt/main()/.

Fortunately, the current implementation, where <tt/_dictionary_system/
is a pointer, will trigger a <tt/seg fault/ whenever this condition
is met. This fault will be triggered every time, because all such
initializations run before <tt/main()/. The strategy is <em/safe/.
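
One possible way to keep such a table while staying clear of the early initialization is to build it inside a function, so that it is filled on first use. The sketch below is an assumption about how this could be done, not something taken from the <em/Linuxconf/ sources: <tt/demo_dictionary/, <tt/demo_load()/ (standing in for <tt/translat_load()/), the two message IDs and <tt/answers()/ are all hypothetical.

```cpp
#include <string>
#include <vector>

// Stand-ins for the generated names (assumptions for this sketch).
static const char **demo_dictionary = nullptr;   // null until loaded
#define M_YES demo_dictionary[0]
#define M_NO  demo_dictionary[1]

// Stand-in for translat_load(): makes the messages available.
void demo_load()
{
    static const char *table[] = { "Yes", "No" };
    demo_dictionary = table;
}

// The table is a function-local static, so it is built on the first
// call and not before main(). demo_load() must have run by then.
const std::vector<std::string> &answers()
{
    static const std::vector<std::string> tb = {
        std::string(M_YES), std::string(M_NO)
    };
    return tb;
}
```

Calling <tt/answers()/ before <tt/demo_load()/ still dereferences a null pointer, so the mistake stays as visible as in the case described above.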
<sect>Recommended usage and conventions
<p>
<sect1>Naming convention for message IDs
<p>
To help people who will translate <em/Linuxconf/, I have used
a convention for ID names.
<descrip>
<tag/B_/
Buttons.
<tag/E_/
Error messages start with this.
<tag/F_/
Field labels start with this.
<tag/I_/
Dialog introductions start with this.
<tag/M_/
All menu entries start with this prefix.
<tag/N_/
Notices and warnings start with this.
<tag/P_/
When the user is prompted for a password, the message ID
starts with this.
<tag/Q_/
Identifies a question (generally a Yes/No prompt).
<tag/T_/
Dialog titles start with this.
<tag/X_/
All other messages which fit in no category.
</descrip>
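
For a translator's tool, the convention above can be turned into a small lookup. The function <tt/id_category/ and the category names it returns are hypothetical; only the prefixes come from the list.

```cpp
#include <string>

// Return the category suggested by the prefix of a message ID,
// following the list above. Anything else gives "unknown".
std::string id_category(const std::string &id)
{
    if (id.size() < 2 || id[1] != '_') return "unknown";
    switch (id[0]) {
    case 'B': return "button";
    case 'E': return "error message";
    case 'F': return "field label";
    case 'I': return "dialog introduction";
    case 'M': return "menu entry";
    case 'N': return "notice or warning";
    case 'P': return "password prompt";
    case 'Q': return "question";
    case 'T': return "dialog title";
    case 'X': return "other";
    default:  return "unknown";
    }
}
```

Knowing the category before reading the text helps choose the right register, for example a short imperative for a button and a full sentence for an error message.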
<sect>How to translate
<p>
<sect1>Go simple
<p>
One way to translate is to go right into the <tt/.dic/ files and add
translations for each message using a different tag. Then use
the <tt/msgcomp/ utility to extract the proper definition.

At first, there is little problem doing this. The <tt/msgscan/ utility
reads, updates and saves the <tt/.dic/ file, so your changes won't be
lost.

The problem comes from the way software is developed. First we develop
and then, when it is stable, we translate. Doing so means that we have
to walk through all the <tt/.dic/ files to make sure our translations still
fit the original messages (the English version, for example). Those
original messages may have changed.

A different scheme was chosen for <em/Linuxconf/.
<sect1>Organization of the <tt/messages/ directory
<p>
The <tt/messages/ directory contains one subdirectory per language
plus one <tt/sources/ directory. The <tt/sources/ directory contains all the
<tt/.dic/ files produced by scanning the <tt/C++/ source files.
These files are never hand edited.

Each other directory has a copy of those <tt/.dic/ files with
the proper translation. A special utility, <tt/msgupd/, has been
created: it basically compares all messages in the <tt/sources/
directory with messages in the translated directory. It compares
only one language (say the English version).
Mostly, <tt/msgupd/ will tell you
<itemize>
<item>Which messages are new.
<item>Which messages have changed (The English wording).
</itemize>
Using that information, you know exactly what you have to do to
keep your work in sync with the current release of <em/Linuxconf/.

<tt/msgupd/ will reorder the translated <tt/.dic/ file (not the
one in the <tt/sources/ directory) so all messages which need
work are at the beginning of the file. It also adds a comment (<tt/.dic/
files may have comments like most normal <em/Unix/ configuration files)
explaining what has to be done.

If the English version of the message was changed, it will retag the
version in the translated file and add the new version, plus a comment.
The old English message will have the tag "<tt/Z/". You can easily
see what has changed.
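
The comparison itself can be sketched as follows, assuming each dictionary has been reduced to a map from message ID to its English value. <tt/EnglishValues/, <tt/UpdateReport/ and <tt/compare/ are hypothetical names; the reordering, the comments and the retagging with <tt/Z/ are left out.

```cpp
#include <map>
#include <string>
#include <vector>

// One dictionary reduced to: message ID -> English value.
using EnglishValues = std::map<std::string, std::string>;

struct UpdateReport {
    std::vector<std::string> added;     // IDs present only in sources
    std::vector<std::string> changed;   // IDs whose English wording differs
};

// Compare the reference dictionary with the translator's copy.
UpdateReport compare(const EnglishValues &sources,
                     const EnglishValues &translated)
{
    UpdateReport report;
    for (const auto &entry : sources) {
        auto it = translated.find(entry.first);
        if (it == translated.end())
            report.added.push_back(entry.first);
        else if (it->second != entry.second)
            report.changed.push_back(entry.first);
    }
    return report;
}
```

The two lists correspond to the two kinds of work reported to the translator: messages to translate for the first time and messages whose translation must be reviewed.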
<sect1>The <tt/msgupd/ utility
<p>
The file <tt/rules.mak/ shows the rules for one translation (which
is not done yet). Look for the targets <tt/msg.cfr/ and <tt/upd.cfr/.
To add a new language, do this:
<itemize>
<item>Create a new empty directory in the <tt/messages/ directory, for
example, <tt/mar/ for <em/Alien language/.
<item>Customize <tt/rules.mak/ and add the targets <tt/msg.mar/
and <tt/upd.mar/.
<item>Run the following command. This will fill the
<tt>messages/mar</tt> directory with all the necessary <tt/.dic/
files.
<tscreen><verb>
make upd.mar
</verb></tscreen>
<item>Go into <tt>messages/mar</tt> and edit each <tt/.dic/ file and
add the proper translation as needed.
<item>Run the following command to produce the binary dictionary
required to run <em/Linuxconf/.
<tscreen><verb>
make msg.mar
</verb></tscreen>
<item>Set the following environment variables and run <em/Linuxconf/.
<itemize>
<item>export LINUXCONF_LANG=mar
<item>export LINUXCONF_DICT=/tmp
This variable is optional. <em/Linuxconf/ will normally
look for its message dictionary in
<tt>/usr/lib/linuxconf</tt>. This variable overrides this.
The <tt/msg.*/ makefile targets generally produce
their output in <tt>/tmp</tt>. This is useful to test new messages
without breaking the current installation of <em/Linuxconf/.
Be aware that this mechanism only works if you execute
<em/Linuxconf/ as root. For security reasons, a normal
user can't override the message dictionary of
<em/Linuxconf/ (although he can select a different
language from <tt>/usr/lib/linuxconf</tt> if available).
</itemize>
</itemize>
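
The effect of the two variables can be summarized by a small function. This is a sketch under assumptions: <tt/dictionary_path/ is a hypothetical helper, and the rule that the file is named after a base name followed by the language (as in <tt>/tmp/linuxconf-msg-1.3.eng</tt> earlier) is inferred from the <tt/msgcomp/ example, not from the loader itself.

```cpp
#include <string>

// Hypothetical helper: where a program would look for its dictionary.
//   base          - file name without the language suffix (assumption)
//   lang          - value of LINUXCONF_LANG
//   dict_override - value of LINUXCONF_DICT, empty if not set
//   is_root       - the override is honoured only for root
std::string dictionary_path(const std::string &base,
                            const std::string &lang,
                            const std::string &dict_override,
                            bool is_root)
{
    std::string dir = "/usr/lib/linuxconf";
    if (is_root && !dict_override.empty()) dir = dict_override;
    return dir + "/" + base + "." + lang;
}
```

A normal user can therefore still change the language, but the directory stays <tt>/usr/lib/linuxconf</tt> whatever <tt/LINUXCONF_DICT/ says.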
<sect1>The <tt/msgcomp/ utility
<p>
The <tt/msgcomp/ utility has been tweaked to support the distributed
directory concept. Mainly it uses the <tt/.dic/ files in the
<tt/sources/ directory as a reference. Message ID numbers are
defined from these files. It then (optionally) uses alternative
<tt/.dic/ files to grab extra translations. The ordering of messages
within those alternative <tt/.dic/ files is unimportant.
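
The idea of numbering from the reference while taking values from elsewhere can be sketched like this. <tt/RefMessage/, <tt/ValuesByTag/ and <tt/build_table/ are hypothetical, and the sketch assumes that for each tag the alternative file is consulted before the reference file.

```cpp
#include <map>
#include <string>
#include <vector>

using ValuesByTag = std::map<char, std::string>;

struct RefMessage {
    std::string id;        // message ID, e.g. "M_MSG1"
    ValuesByTag values;    // values found in the sources .dic file
};

// Build the table of values in reference order. The translated entries
// are looked up by ID, so their own order does not matter.
std::vector<std::string> build_table(
    const std::vector<RefMessage> &reference,
    const std::map<std::string, ValuesByTag> &translated,
    const std::string &preference)
{
    std::vector<std::string> table;
    for (const RefMessage &msg : reference) {
        std::string chosen;
        auto extra = translated.find(msg.id);
        for (char tag : preference) {
            if (extra != translated.end()) {
                auto hit = extra->second.find(tag);
                if (hit != extra->second.end()) { chosen = hit->second; break; }
            }
            auto hit = msg.values.find(tag);
            if (hit != msg.values.end()) { chosen = hit->second; break; }
        }
        table.push_back(chosen);   // index in table = message number
    }
    return table;
}
```

Because positions come only from the reference list, a translated file that has been reordered still produces a table matching the numbers compiled into the program.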
<sect>Licensing
<p>
The <em/translate/ directory is part of the <em/Linuxconf/ project
but carries a special license. There is no restriction on usage. Feel
free to incorporate this system into any project.

This simple license does not apply to the rest of <em/Linuxconf/,
which is covered by the standard GNU Copyleft license. See the
file <tt/LICENSE/ in the root directory.

If you find it useful for other projects, send me a note and some
comments if possible.
</article>